Splitting of Compound Terms in non-Prototypical Compounding Languages

Authors

  • Elizaveta Clouet
  • Béatrice Daille
Abstract

Compounding occurs in a wide variety of languages, in differing proportions. The rate of compounds in a text depends on the language, but also on the genre and the domain. Scientific and technical texts are especially conducive to compounding, even in languages that are not traditionally regarded as highly compounding. In this article we address compound splitting of specialized terms. We propose a multilingual method of compound recognition and splitting, which uses corpus frequencies, lexical data and, optionally, linguistic rules. It is a supervised method that requires a small number of segmented compounds as input. We evaluate the method on two languages that rarely serve as material for automatic splitting systems: English and Russian. The results obtained are competitive with those of a state-of-the-art corpus-driven approach.
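To make the role of corpus frequencies concrete, the following is a minimal sketch of generic frequency-based splitting: every segmentation into known words is scored by the geometric mean of its parts' corpus frequencies, and the unsplit word competes with its own frequency. This is not the authors' method, which is supervised and also draws on lexical data and optional linguistic rules. The function name `split_compound`, the `min_len` parameter and the toy frequency table `FREQ` are assumptions made for illustration.

```python
from math import prod

# Toy corpus frequencies (invented for illustration, not from the paper).
FREQ = {"data": 500, "base": 300, "database": 20, "stream": 100}


def split_compound(word, freq, min_len=3):
    """Return the segmentation of `word` with the highest geometric-mean
    frequency of its parts; return [word] if no split scores higher."""
    word = word.lower()

    def segmentations(rest):
        # Yield every way to cut `rest` into parts that all occur in `freq`.
        if not rest:
            yield []
            return
        for i in range(min_len, len(rest) + 1):
            head = rest[:i]
            if head in freq:
                for tail in segmentations(rest[i:]):
                    yield [head] + tail

    # The unsplit word is the baseline candidate (score 0 if unseen).
    best_parts, best_score = [word], float(freq.get(word, 0))
    for parts in segmentations(word):
        if len(parts) < 2:
            continue  # the whole word is already the baseline
        score = prod(freq[p] for p in parts) ** (1 / len(parts))
        if score > best_score:
            best_parts, best_score = parts, score
    return best_parts


print(split_compound("database", FREQ))
print(split_compound("stream", FREQ))
```

A known weakness of this kind of scoring is that a compound which is itself frequent in the corpus stays unsplit, which is one reason to combine frequencies with other evidence.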


Similar resources

Automatic Compound Processing: Compound Splitting and Semantic Analysis for Afrikaans and Dutch

Compounding, the process of combining several simplex words into a complex whole, is a productive process in a wide range of languages. In particular, concatenative compounding, in which the components are “glued” together, leads to problems, for instance, in computational tools that rely on a predefined lexicon. Here we present the AuCoPro project, which focuses on compounding in the closely r...


Towards Unsupervised and Language-independent Compound Splitting using Inflectional Morphological Transformations

In this paper, we address the task of language-independent, knowledge-lean and unsupervised compound splitting, which is an essential component for many natural language processing tasks such as machine translation. Previous methods on statistical compound splitting either include language-specific knowledge (e.g., linking elements) or rely on parallel data, which results in limited applicabilit...


Decompounding query keywords from compounding languages

Splitting compound words has proved to be useful in areas such as Machine Translation, Speech Recognition or Information Retrieval (IR). Furthermore, real-time IR systems (such as search engines) need to cope with noisy data, as user queries are sometimes written quickly and submitted without review. In this paper we apply a state-of-the-art procedure for German decompounding to other compoundi...


Evaluation of Microbial Contamination and Physico-Chemical Properties of Compounding Drugs in Yazd Pharmacies

Aims: Any drug product made in a pharmacy, hospital or factory may be contaminated with microbes. This contamination can originate from raw materials or arise during manufacture of the product. It is also important to study the physical and chemical properties and stability of compound products. Materials & Method: In this study, a specific sample of a compound drug was ordered to 63 drugstores with ...


Compound terms and their constituent elements in information retrieval

Compounds, especially in languages where compounds are formed by concatenation without intervening whitespace between elements, pose challenges to simple text retrieval algorithms. Search queries that include compounds may not retrieve texts where elements of those compounds occur in uncompounded form; search queries that lack compounds will not retrieve texts where the salient elements are bur...




Publication date: 2014